{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "djUvWu41mtXa"
      },
      "source": [
        "##### Copyright 2019 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "cellView": "form",
        "id": "su2RaORHpReL"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NztQK2uFpXT-"
      },
      "source": [
        "# TensorBoard 性能分析: 在 Keras 中对基本训练指标进行性能分析\n",
        "\n",
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://tensorflow.google.cn/tensorboard/tensorboard_profiling_keras\"><img src=\"https://tensorflow.google.cn/images/tf_logo_32px.png\" />在 TensorFlow.org 上查看</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs-l10n/blob/master/site/zh-cn/tensorboard/tensorboard_profiling_keras.ipynb\"><img src=\"https://tensorflow.google.cn/images/colab_logo_32px.png\" />在 Google Colab 上运行</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/docs-l10n/blob/master/site/zh-cn/tensorboard/tensorboard_profiling_keras.ipynb\"><img src=\"https://tensorflow.google.cn/images/GitHub-Mark-32px.png\" />在 GitHub 上查看源代码</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a href=\"https://storage.googleapis.com/tensorflow_docs/docs-l10n/site/zh-cn/tensorboard/tensorboard_profiling_keras.ipynb\"><img src=\"https://tensorflow.google.cn/images/download_logo_32px.png\" />下载此 notebook</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eDXRFe_qp5C3"
      },
      "source": [
        "## 总览\n",
        "在机器学习中性能十分重要。TensorFlow 有一个内置的性能分析器可以使您不用费力记录每个操作的运行时间。然后您就可以在 TensorBoard 的 *Profile Plugin* 中对配置结果进行可视化。本教程侧重于 GPU ，但性能分析插件也可以按照[云 TPU 工具](https://cloud.google.com/tpu/docs/cloud-tpu-tools)来在 TPU 上使用。\n",
        "\n",
        "本教程提供了非常基础的示例以帮助您学习如何在开发 Keras 模型时启用性能分析器。您将学习如何使用 Keras TensorBoard 回调函数来可视化性能分析结果。“其他性能分析方式”中提到的 *Profiler API* 和 *Profiler Server* 允许您分析非 Keras TensorFlow 的任务。"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dG-nnZK9qW9z"
      },
      "source": [
        "## 事先准备\n",
        "\n",
        "\n",
        "*   在你的本地机器上安装最新的[TensorBoard](https://tensorflow.google.cn/tensorboard)。\n",
        "\n",
        "*   在 Notebook 设置的加速器的下拉菜单中选择 “GPU”（假设您在Colab上运行此notebook）\n",
        "\n",
        ">![Notebook 设置](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-notebook-settings.png?raw=1)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DZhGh-G7KoKL"
      },
      "source": [
        "## 设置"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "id": "3U5gdCw_nSG3"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "TensorFlow 2.x selected.\n"
          ]
        }
      ],
      "source": [
        "try:\n",
        "  # %tensorflow_version 只在 Colab 中存在。\n",
        "  %tensorflow_version 2.x\n",
        "except Exception:\n",
        "  pass\n",
        "\n",
        "# 加载 TensorBoard notebook 扩展。\n",
        "%load_ext tensorboard\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1qIKtOBrqc9Y"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "TensorFlow version:  2.0.0-dev20190424\n"
          ]
        }
      ],
      "source": [
        "from __future__ import absolute_import\n",
        "from __future__ import division\n",
        "from __future__ import print_function\n",
        "\n",
        "from datetime import datetime\n",
        "from packaging import version\n",
        "\n",
        "import functools\n",
        "import tensorflow as tf\n",
        "import tensorflow_datasets as tfds\n",
        "from tensorflow.python.keras import backend\n",
        "from tensorflow.python.keras import layers\n",
        "\n",
        "import numpy as np\n",
        "\n",
        "print(\"TensorFlow version: \", tf.__version__)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8ZM-6NzYgPRn"
      },
      "source": [
        "确认 TensorFlow 可以看到 GPU。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "gp2p-MemgAIh"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Found GPU at: /device:GPU:0\n"
          ]
        }
      ],
      "source": [
        "device_name = tf.test.gpu_device_name()\n",
        "if not tf.test.is_gpu_available():\n",
        "  raise SystemError('GPU device not found')\n",
        "print('Found GPU at: {}'.format(device_name))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6YDAoNCN3ZNS"
      },
      "source": [
        "## 使用 TensorBoard callback 运行一个简单的模型\n",
        "\n",
        "你将使用 Keras 来构建一个使用 ResNet56 (参考: [用于图像识别的深度残差学习](https://arxiv.org/abs/1512.03385))来分类[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)图像集的简单模型。\n",
        "\n",
        "从 [TensorFlow 模型园](https://github.com/tensorflow/models/blob/master/official/resnet/keras/resnet_cifar_model.py)复制 ResNet 模型代码。\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ImCFrQ74eerE"
      },
      "outputs": [],
      "source": [
        "BATCH_NORM_DECAY = 0.997\n",
        "BATCH_NORM_EPSILON = 1e-5\n",
        "L2_WEIGHT_DECAY = 2e-4\n",
        "\n",
        "\n",
        "def identity_building_block(input_tensor,\n",
        "                            kernel_size,\n",
        "                            filters,\n",
        "                            stage,\n",
        "                            block,\n",
        "                            training=None):\n",
        " \n",
        "  \"\"\"标识块是一种在捷径上没有卷积层的块。\n",
        "\n",
        "  参数：\n",
        "    input_tensor：输入张量\n",
        "    kernel_size：默认为3，内核大小为\n",
        "        主路径上的中间卷积层\n",
        "    过滤器：整数列表，主路径上3个卷积层的过滤器\n",
        "    stage：整数，当前阶段标签，用于生成层名称\n",
        "    block：当前块标签，用于生成层名称\n",
        "    training：仅在使用 Estimator 训练 keras 模型时使用。 在其他情况下，它是自动处理的。\n",
        "\n",
        "  返回值：\n",
        "    输出块的张量。\n",
        "  \"\"\"\n",
        "  filters1, filters2 = filters\n",
        "  if tf.keras.backend.image_data_format() == 'channels_last':\n",
        "    bn_axis = 3\n",
        "  else:\n",
        "    bn_axis = 1\n",
        "  conv_name_base = 'res' + str(stage) + block + '_branch'\n",
        "  bn_name_base = 'bn' + str(stage) + block + '_branch'\n",
        "\n",
        "  x = tf.keras.layers.Conv2D(filters1, kernel_size,\n",
        "                             padding='same',\n",
        "                             kernel_initializer='he_normal',\n",
        "                             kernel_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             bias_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             name=conv_name_base + '2a')(input_tensor)\n",
        "  x = tf.keras.layers.BatchNormalization(axis=bn_axis,\n",
        "                                         name=bn_name_base + '2a',\n",
        "                                         momentum=BATCH_NORM_DECAY,\n",
        "                                         epsilon=BATCH_NORM_EPSILON)(\n",
        "                                             x, training=training)\n",
        "  x = tf.keras.layers.Activation('relu')(x)\n",
        "\n",
        "  x = tf.keras.layers.Conv2D(filters2, kernel_size,\n",
        "                             padding='same',\n",
        "                             kernel_initializer='he_normal',\n",
        "                             kernel_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             bias_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             name=conv_name_base + '2b')(x)\n",
        "  x = tf.keras.layers.BatchNormalization(axis=bn_axis,\n",
        "                                         name=bn_name_base + '2b',\n",
        "                                         momentum=BATCH_NORM_DECAY,\n",
        "                                         epsilon=BATCH_NORM_EPSILON)(\n",
        "                                             x, training=training)\n",
        "\n",
        "  x = tf.keras.layers.add([x, input_tensor])\n",
        "  x = tf.keras.layers.Activation('relu')(x)\n",
        "  return x\n",
        "\n",
        "\n",
        "def conv_building_block(input_tensor,\n",
        "                        kernel_size,\n",
        "                        filters,\n",
        "                        stage,\n",
        "                        block,\n",
        "                        strides=(2, 2),\n",
        "                        training=None):\n",
        "  \"\"\"在捷径中具有卷积层的块。\n",
        "\n",
        "  参数：\n",
        "    input_tensor：输入张量\n",
        "    kernel_size：默认为3，内核大小为\n",
        "        主路径上的中间卷积层\n",
        "    filters：整数列表，主路径上3个卷积层的过滤器\n",
        "    stage：整数，当前阶段标签，用于生成层名称\n",
        "    block：当前块标签，用于生成层名称\n",
        "    training：仅在使用 Estimator 训练 keras 模型时使用。在其他情况下，它是自动处理的。\n",
        "\n",
        "  返回值：\n",
        "    输出块的张量。\n",
        "\n",
        "  请注意，从第3阶段开始，\n",
        "  主路径上的第一个卷积层的步长=（2，2）\n",
        "  而且捷径的步长=（2，2）\n",
        "  \"\"\"\n",
        "  filters1, filters2 = filters\n",
        "  if tf.keras.backend.image_data_format() == 'channels_last':\n",
        "    bn_axis = 3\n",
        "  else:\n",
        "    bn_axis = 1\n",
        "  conv_name_base = 'res' + str(stage) + block + '_branch'\n",
        "  bn_name_base = 'bn' + str(stage) + block + '_branch'\n",
        "\n",
        "  x = tf.keras.layers.Conv2D(filters1, kernel_size, strides=strides,\n",
        "                             padding='same',\n",
        "                             kernel_initializer='he_normal',\n",
        "                             kernel_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             bias_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             name=conv_name_base + '2a')(input_tensor)\n",
        "  x = tf.keras.layers.BatchNormalization(axis=bn_axis,\n",
        "                                         name=bn_name_base + '2a',\n",
        "                                         momentum=BATCH_NORM_DECAY,\n",
        "                                         epsilon=BATCH_NORM_EPSILON)(\n",
        "                                             x, training=training)\n",
        "  x = tf.keras.layers.Activation('relu')(x)\n",
        "\n",
        "  x = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same',\n",
        "                             kernel_initializer='he_normal',\n",
        "                             kernel_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             bias_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             name=conv_name_base + '2b')(x)\n",
        "  x = tf.keras.layers.BatchNormalization(axis=bn_axis,\n",
        "                                         name=bn_name_base + '2b',\n",
        "                                         momentum=BATCH_NORM_DECAY,\n",
        "                                         epsilon=BATCH_NORM_EPSILON)(\n",
        "                                             x, training=training)\n",
        "\n",
        "  shortcut = tf.keras.layers.Conv2D(filters2, (1, 1), strides=strides,\n",
        "                                    kernel_initializer='he_normal',\n",
        "                                    kernel_regularizer=\n",
        "                                    tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                                    bias_regularizer=\n",
        "                                    tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                                    name=conv_name_base + '1')(input_tensor)\n",
        "  shortcut = tf.keras.layers.BatchNormalization(\n",
        "      axis=bn_axis, name=bn_name_base + '1',\n",
        "      momentum=BATCH_NORM_DECAY, epsilon=BATCH_NORM_EPSILON)(\n",
        "          shortcut, training=training)\n",
        "\n",
        "  x = tf.keras.layers.add([x, shortcut])\n",
        "  x = tf.keras.layers.Activation('relu')(x)\n",
        "  return x\n",
        "\n",
        "\n",
        "def resnet_block(input_tensor,\n",
        "                 size,\n",
        "                 kernel_size,\n",
        "                 filters,\n",
        "                 stage,\n",
        "                 conv_strides=(2, 2),\n",
        "                 training=None):\n",
        "  \"\"\"一个应用层后跟多个标识块的块。\n",
        "\n",
        "  参数：\n",
        "    input_tensor：输入张量\n",
        "    size：整数，构成转化卷积/身份块的数量。\n",
        "    一个卷积层使用后，再跟（size-1）个身份块。\n",
        "    kernel_size：默认为3，内核大小为\n",
        "        主路径上的中间卷积层\n",
        "    filters：整数列表，主路径上3个卷积层的过滤器\n",
        "    stage：整数，当前阶段标签，用于生成层名称\n",
        "    conv_strides：块中第一个卷积层的步长。\n",
        "    training：仅在使用 Estimator 训练 keras 模型时使用。其他情况它会自动处理。  \n",
        "\n",
        "  返回值：\n",
        "    应用层和身份块后的输出张量。\n",
        "  \"\"\"\n",
        "\n",
        "  x = conv_building_block(input_tensor, kernel_size, filters, stage=stage,\n",
        "                          strides=conv_strides, block='block_0',\n",
        "                          training=training)\n",
        "  for i in range(size - 1):\n",
        "    x = identity_building_block(x, kernel_size, filters, stage=stage,\n",
        "                                block='block_%d' % (i + 1), training=training)\n",
        "  return x\n",
        "\n",
        "def resnet(num_blocks, classes=10, training=None):\n",
        "  \"\"\"实例化ResNet体系结构。\n",
        "\n",
        "  参数：\n",
        "    num_blocks：整数，每个块中的卷积/身份块的数量。\n",
        "      ResNet 包含3个块，每个块包含一个卷积块\n",
        "      后面跟着(layers_per_block - 1) 个身份块数。 每\n",
        "      卷积/理想度块具有2个卷积层。 用输入\n",
        "      卷积层和池化层至最后，这带来了\n",
        "      网络的总大小为（6 * num_blocks + 2）\n",
        "    classes：将图像分类为的可选类数\n",
        "    training：仅在使用 Estimator 训练 keras 模型时使用。其他情况下它会自动处理。\n",
        "\n",
        "  返回值：\n",
        "    Keras模型实例。\n",
        "  \"\"\"\n",
        "\n",
        "  input_shape = (32, 32, 3)\n",
        "  img_input = layers.Input(shape=input_shape)\n",
        "\n",
        "  if backend.image_data_format() == 'channels_first':\n",
        "    x = layers.Lambda(lambda x: backend.permute_dimensions(x, (0, 3, 1, 2)),\n",
        "                      name='transpose')(img_input)\n",
        "    bn_axis = 1\n",
        "  else:  # channel_last\n",
        "    x = img_input\n",
        "    bn_axis = 3\n",
        "\n",
        "  x = tf.keras.layers.ZeroPadding2D(padding=(1, 1), name='conv1_pad')(x)\n",
        "  x = tf.keras.layers.Conv2D(16, (3, 3),\n",
        "                             strides=(1, 1),\n",
        "                             padding='valid',\n",
        "                             kernel_initializer='he_normal',\n",
        "                             kernel_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             bias_regularizer=\n",
        "                             tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                             name='conv1')(x)\n",
        "  x = tf.keras.layers.BatchNormalization(axis=bn_axis, name='bn_conv1',\n",
        "                                         momentum=BATCH_NORM_DECAY,\n",
        "                                         epsilon=BATCH_NORM_EPSILON)(\n",
        "                                             x, training=training)\n",
        "  x = tf.keras.layers.Activation('relu')(x)\n",
        "\n",
        "  x = resnet_block(x, size=num_blocks, kernel_size=3, filters=[16, 16],\n",
        "                   stage=2, conv_strides=(1, 1), training=training)\n",
        "\n",
        "  x = resnet_block(x, size=num_blocks, kernel_size=3, filters=[32, 32],\n",
        "                   stage=3, conv_strides=(2, 2), training=training)\n",
        "\n",
        "  x = resnet_block(x, size=num_blocks, kernel_size=3, filters=[64, 64],\n",
        "                   stage=4, conv_strides=(2, 2), training=training)\n",
        "\n",
        "  x = tf.keras.layers.GlobalAveragePooling2D(name='avg_pool')(x)\n",
        "  x = tf.keras.layers.Dense(classes, activation='softmax',\n",
        "                            kernel_initializer='he_normal',\n",
        "                            kernel_regularizer=\n",
        "                            tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                            bias_regularizer=\n",
        "                            tf.keras.regularizers.l2(L2_WEIGHT_DECAY),\n",
        "                            name='fc10')(x)\n",
        "\n",
        "  inputs = img_input\n",
        "  # 创建模型\n",
        "  model = tf.keras.models.Model(inputs, x, name='resnet56')\n",
        "\n",
        "  return model\n",
        "\n",
        "\n",
        "resnet20 = functools.partial(resnet, num_blocks=3)\n",
        "resnet32 = functools.partial(resnet, num_blocks=5)\n",
        "resnet56 = functools.partial(resnet, num_blocks=9)\n",
        "resnet110 = functools.partial(resnet, num_blocks=18)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1lAek-Lye8_q"
      },
      "source": [
        "从 [TensorFlow 数据集](https://tensorflow.google.cn/datasets)下载 CIFAR-10 数据集。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "H8A67-bNXzsx"
      },
      "outputs": [],
      "source": [
        "cifar_builder = tfds.builder('cifar10')\n",
        "cifar_builder.download_and_prepare()\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "21jm6LOSq9EN"
      },
      "source": [
        "建立数据输入线性通信模型并编译 ResNet56 模型。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "j-ryO6OxnQH_"
      },
      "outputs": [],
      "source": [
        "HEIGHT = 32\n",
        "WIDTH = 32\n",
        "NUM_CHANNELS = 3\n",
        "NUM_CLASSES = 10\n",
        "BATCH_SIZE = 128\n",
        "\n",
        "def preprocess_data(record):\n",
        "  image = record['image']\n",
        "  label = record['label']\n",
        "  \n",
        "  # 调整图像大小以在每侧增加四个额外的像素。\n",
        "  image = tf.image.resize_with_crop_or_pad(\n",
        "      image, HEIGHT + 8, WIDTH + 8)\n",
        "\n",
        "  # 随机裁剪图像的 [HEIGHT，WIDTH] 部分。\n",
        "  image = tf.image.random_crop(image, [HEIGHT, WIDTH, NUM_CHANNELS])\n",
        "\n",
        "  # 随机水平翻转图像。\n",
        "  image = tf.image.random_flip_left_right(image)\n",
        "\n",
        "  # 减去均值并除以像素方差。\n",
        "  image = tf.image.per_image_standardization(image)\n",
        "  \n",
        "  label = tf.compat.v1.sparse_to_dense(label, (NUM_CLASSES,), 1)\n",
        "  return image, label\n",
        "\n",
        "train_data = cifar_builder.as_dataset(split=tfds.Split.TRAIN)\n",
        "train_data = train_data.repeat()\n",
        "train_data = train_data.map(\n",
        "    lambda value: preprocess_data(value))\n",
        "train_data = train_data.shuffle(1024)\n",
        "\n",
        "train_data = train_data.batch(BATCH_SIZE)\n",
        "\n",
        "model = resnet56(classes=NUM_CLASSES)\n",
        "\n",
        "model.compile(optimizer='SGD',\n",
        "              loss='categorical_crossentropy',\n",
        "              metrics=['categorical_accuracy'])\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_5llFQBKHFmA"
      },
      "source": [
        "当你创建 TensorBoard 回调时，您可以指定您想要进行性能分析的批次。默认情况下，TensorFlow 将对第二个批次进行性能分析，因为第一个批次的时候会运行很多一次性的图优化。您可以通过设置 `profile_batch` 对其进行修改。您还可以通过将其设置为 0 来关闭性能分析。\n",
        "\n",
        "这时候，您将会对第三批次进行性能分析。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WmY-2znGJxNY"
      },
      "outputs": [],
      "source": [
        "log_dir=\"logs/profile/\" + datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n",
        "\n",
        "tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch = 3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ylDhh7zlJ273"
      },
      "source": [
        "开始使用 [Model.fit()](https://https://tensorflow.google.cn/api_docs/python/tf/keras/models/Model#fit) 进行训练。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LEb_1HETJ_tX"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Epoch 1/5\n",
            " 1/20 [>.............................] - ETA: 14:27 - loss: 5.4251 - categorical_accuracy: 0.0859"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "W0425 21:14:50.396199 140078590396288 callbacks.py:238] Method (on_train_batch_end) is slow compared to the batch update (0.317050). Check your callbacks.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            " 2/20 [==>...........................] - ETA: 6:58 - loss: 5.5955 - categorical_accuracy: 0.0781 "
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "W0425 21:14:50.954807 140078590396288 callbacks.py:238] Method (on_train_batch_end) is slow compared to the batch update (0.268180). Check your callbacks.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            " 3/20 [===>..........................] - ETA: 4:26 - loss: 5.7003 - categorical_accuracy: 0.0911"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "W0425 21:14:51.180765 140078590396288 callbacks.py:238] Method (on_train_batch_end) is slow compared to the batch update (0.134130). Check your callbacks.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "20/20 [==============================] - 51s 3s/step - loss: 5.3766 - categorical_accuracy: 0.1004\n",
            "Epoch 2/5\n",
            "20/20 [==============================] - 5s 227ms/step - loss: 4.8007 - categorical_accuracy: 0.0988\n",
            "Epoch 3/5\n",
            "20/20 [==============================] - 5s 242ms/step - loss: 4.3439 - categorical_accuracy: 0.0980\n",
            "Epoch 4/5\n",
            "20/20 [==============================] - 5s 247ms/step - loss: 3.9405 - categorical_accuracy: 0.1074\n",
            "Epoch 5/5\n",
            "20/20 [==============================] - 5s 225ms/step - loss: 3.6195 - categorical_accuracy: 0.1176\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "<tensorflow.python.keras.callbacks.History at 0x7f65f8758908>"
            ]
          },
          "execution_count": 12,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "model.fit(train_data,\n",
        "          steps_per_epoch=20,\n",
        "          epochs=5, \n",
        "          callbacks=[tensorboard_callback])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "042k7GMERVkx"
      },
      "source": [
        "## 使用 TensorBoard 可视化性能分析结果\n",
        "\n",
        "不幸的是，由于[＃1913](https://github.com/tensorflow/tensorboard/issues/1913), 您无法在 Colab 中使用 TensorBoard 来可视化性能分析结果。您需要下载日志目录并在本地计算机上启动 TensorBoard。\n",
        "\n",
        "压缩下载日志:\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6pck56gKReON"
      },
      "outputs": [],
      "source": [
        "!tar -zcvf logs.tar.gz logs/profile/"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TZOf_K4L-Nkv"
      },
      "source": [
        "在“文件”选项卡中右键单击以下载 `logdir.tar.gz`。\n",
        "\n",
        "![下载](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-download-logdir.png?raw=1)\n",
        "\n",
        "请保证在你本地的机器安装最新的 [TensorBoard](https://tensorflow.google.cn/tensorboard)。在你的本地机器上执行下面的命令：\n",
        "\n",
        "```\n",
        "> cd download/directory\n",
        "> tar -zxvf logs.tar.gz\n",
        "> tensorboard --logdir=logs/ --port=6006\n",
        "\n",
        "```\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ciSIRibhRi6N"
      },
      "source": [
        "在您的Chrome浏览器中打开一个新标签，然后导航至[localhost：6006](http://localhost:6006)，单击 “Profile” 标签。您可能会看到以下性能分析结果：\n",
        "\n",
        "![跟踪视图](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-trace-viewer.png?raw=1)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "roE94vH9mJ6k"
      },
      "source": [
        "## 跟踪查看器\n",
        "当您单击性能分析选项卡后，您将看到跟踪查看器。该页面显示了聚合期间 CPU 和加速器上发生的不同事件的时间轴。\n",
        "\n",
        "跟踪查看器在垂直轴上显示多个 *事件组*。 每个事件组都有多个水平 *跟踪*，其中填充了跟踪事件。*跟踪* 事件是在线程或 GPU 流上执行的基本时间线，。单个事件是时间轴轨道上的彩色矩形块。时间从左到右移动。\n",
        "\n",
        "您可以使用 `w`（放大），`s`（缩小），`a`（向左滚动），`d`（向右滚动）浏览结果。\n",
        "\n",
        "单个矩形代表 *跟踪事件* ：从这个时间的开始到结束时间。 要研究单个矩形，可以在浮动工具栏中选择鼠标光标图标后单击它。 这将显示有关矩形的信息，例如其开始时间和持续时间。\n",
        "\n",
        "除了点击之外，您还可以拖动鼠标以选择覆盖一组跟踪事件的矩形。这将为您提供与该矩形相交并汇总的事件列表。 `m` 键可用于测量所选事件的持续时间。\n",
        "\n",
        "![List of Events](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-trace-viewer-select.png?raw=1)\n",
        "\n",
        "跟踪事件是从三个来源收集的：\n",
        "\n",
        "\n",
        "*   **CPU:** CPU事件位于名为`/host:CPU`的事件组下。每个轨道代表 CPU 上的一个线程。例如，输入线性通信模型事件，GPU 操作调度事件， CPU 操作执行事件等。\n",
        "*   **GPU:**  GPU 事件位于以 `/device:GPU:`为前缀的事件组下。 除了 `stream:all`，每个事件组都代表在 GPU 上一个流。 `stream::all`将所有事件汇集到一个 GPU 上。例如。 内存复制事件，内核执行事件等。\n",
        "*   **TensorFlow 运行时间:** 运行时事件在以 `/job:`为前缀的事件组下。运行事件表示 python 程序调用的 TensorFlow ops。 例如， tf.function 执行事件等。\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XAcO9sj4B2DK"
      },
      "source": [
        "## 调试性能\n",
        "现在，您将使用 Trace Viewer 来改善您的模型的性能。\n",
        "\n",
        "让我们回到刚刚捕获的分析结果。\n",
        "\n",
        "![GPU kernel](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-idle-gpu.png?raw=1)\n",
        "\n",
        "GPU 事件表明，GPU 在该步骤的上半部分什么都没有做。\n",
        "\n",
        "![CPU events](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-input-cpu.png?raw=1)\n",
        "\n",
        "CPU 事件表明，在此步骤的开始的时候，CPU 被数据输入管道占用。\n",
        "\n",
        "![Runtime](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-blocking-runtime.png?raw=1)\n",
        "\n",
        "在 TensorFlow 运行时中，有一个叫 `Iterator::GetNextSync`的大阻塞，这是从数据输入管道中获取下一批的阻塞调用。而且它阻碍了训练步骤。 因此，如果您可以在 `s-1` 的时候为 `s` 步骤准备输入数据，则可以更快地训练该模型。\n",
        "\n",
        "您也可以通过使用 [tf.data.prefetch](https://tensorflow.google.cn/api_docs/python/tf/data/Dataset#prefetch)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "JZ6UeYx9TT2T"
      },
      "outputs": [],
      "source": [
        "train_data = cifar_builder.as_dataset(split=tfds.Split.TRAIN)\n",
        "train_data = train_data.repeat()\n",
        "train_data = train_data.map(\n",
        "    lambda value: preprocess_data(value))\n",
        "train_data = train_data.shuffle(1024)\n",
        "train_data = train_data.batch(BATCH_SIZE)\n",
        "\n",
        "# 它将在（s-1）步骤中预取数据\n",
        "train_data = train_data.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EfD6pnhgT7q3"
      },
      "source": [
        "重新运行模型。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tgFqaHYBUADP"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Epoch 1/5\n",
            "20/20 [==============================] - 5s 265ms/step - loss: 3.4081 - categorical_accuracy: 0.1055\n",
            "Epoch 2/5\n",
            "20/20 [==============================] - 4s 205ms/step - loss: 3.3122 - categorical_accuracy: 0.1141\n",
            "Epoch 3/5\n",
            "20/20 [==============================] - 4s 200ms/step - loss: 3.2795 - categorical_accuracy: 0.1199\n",
            "Epoch 4/5\n",
            "20/20 [==============================] - 4s 204ms/step - loss: 3.2237 - categorical_accuracy: 0.1469\n",
            "Epoch 5/5\n",
            "20/20 [==============================] - 4s 201ms/step - loss: 3.1888 - categorical_accuracy: 0.1465\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "<tensorflow.python.keras.callbacks.History at 0x7f65fbf87898>"
            ]
          },
          "execution_count": 14,
          "metadata": {
            "tags": []
          },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "log_dir=\"logs/profile/\" + datetime.now().strftime(\"%Y%m%d-%H%M%S\")\n",
        "\n",
        "tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch = 3)\n",
        "\n",
        "model.fit(train_data,\n",
        "          steps_per_epoch=20,\n",
        "          epochs=5, \n",
        "          callbacks=[tensorboard_callback])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LFtVDt-9UVkn"
      },
      "source": [
        "Woohoo! 你刚刚把训练性能从 *~235ms/step* 提高到 *~200ms/step*。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "if5LuLl_pgna"
      },
      "outputs": [],
      "source": [
        "!tar -zcvf logs.tar.gz logs/profile/"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aBBKSVJVp4yk"
      },
      "source": [
        "再一次下载 `logs` 目录来查看 TensorBoard的新的分析结果。\n",
        "\n",
        "![TF Runtime](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-prefetch-runtime.png?raw=1)\n",
        "\n",
        "`Iterator::GetNextSync`大阻塞不再存在。\n",
        "\n",
        "做得好！\n",
        "\n",
        "显然，这依旧不是最佳性能。请自己尝试，看看是否可以有更多的改进。\n",
        "\n",
        "有关性能调整的一些有用参考：\n",
        "\n",
        "\n",
        "*   [数据输入线性通信模型](https://tensorflow.google.cn/guide/data_performance)\n",
        "*   [训练表现: 更快收敛的用户指南 (TensorFlow Dev Summit 2018)](https://www.youtube.com/watch?v=SxOsJPaxHME)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pLfa4vMn626q"
      },
      "source": [
        "## 其他分析方式\n",
        "除了 TensorBoard 回调外，TensorFlow 还提供了其他两种方式来手动触发分析器：*Profiler APIs* 和 *Profiler Service*。\n",
        "\n",
        "注意：请不要同时运行多个分析器。如果您想将 Profiler API 或 Profiler Service 与 TensorBoard 回调一起使用，请确保将`profile_batch` 参数设置为0。\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gt9Dm8PkL1FI"
      },
      "source": [
        "### Profiler APIs"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VYywGzC2GQ8w"
      },
      "outputs": [],
      "source": [
        "# 内容管理接口\n",
        "with tf.python.eager.profiler.Profiler('logdir_path'):\n",
        "  # 进行你的训练\n",
        "  pass\n",
        "\n",
        "\n",
        "# 功能接口\n",
        "tf.python.eager.profiler.start()\n",
        "# 进行你的训练\n",
        "profiler_result = tf.python.eager.profiler.stop()\n",
        "tf.python.eager.profiler.save('logdir_path', profiler_result)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NSHEq0rIHHBs"
      },
      "source": [
        "### Profiler Service\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "USTAe02KHcql"
      },
      "outputs": [],
      "source": [
        "# 此 API 将在您的 TensorFlow 作业上启动 gRPC 服务器，该 API 可以按需接收分析请求。\n",
        "tf.python.eager.profiler.start_profiler_server(6009)\n",
        "\n",
        "# 在这里写你的 TensorFlow 项目"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AgIro3xQIXUa"
      },
      "source": [
        "然后，您可以单击 “Capture Profile” 按钮将性能分析请求发送到 Profiler 服务器以在 TensorBoard 上执行按需分析：\n",
        "\n",
        "![CAPTURE PROFILE](https://github.com/tensorflow/tensorboard/blob/master/docs/images/profiler-capture.png?raw=1)\n",
        "\n",
        "成功捕获后将显示一条消息。 然后，您可以刷新TensorBoard来获得可视化结果。"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [],
      "name": "tensorboard_profiling_keras.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
